@davidheineman

Bump in-loop evals to v0.8.1. This adds "fast" MCQA, which scores MC tasks in 1 forward pass instead of 4 by extracting the A/B/C/D logits from a single pass.

allenai/OLMo-in-loop-evals#8

This makes the MC tasks 4x faster and produces the same numbers.
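
To illustrate why one pass can give the same numbers, here is a minimal sketch, not the actual `ai2-olmo-eval` implementation. It assumes each answer letter is a single token and that the model is causal, so the logits at the last prompt position do not depend on which letter follows. `ToyModel`, `CHOICE_TOKEN_IDS`, `score_slow`, and `score_fast` are hypothetical names made up for this example.

```python
import math

# Hypothetical token ids for the answer letters; real ids depend on the tokenizer.
CHOICE_TOKEN_IDS = {"A": 3, "B": 7, "C": 11, "D": 13}


class ToyModel:
    """Stand-in for a causal LM that counts how many forward passes it runs."""

    def __init__(self):
        self.num_forward_passes = 0

    def next_token_logits(self, prompt_ids):
        # Deterministic fake logits over a 16-token vocabulary.
        self.num_forward_passes += 1
        seed = sum(prompt_ids)
        return [math.sin(seed + 1.7 * i) for i in range(16)]


def log_softmax(logits):
    # Numerically stable log-softmax over a list of floats.
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def score_slow(model, prompt_ids):
    # One forward pass per answer choice (4 passes in total).
    scores = {}
    for letter, token_id in CHOICE_TOKEN_IDS.items():
        log_probs = log_softmax(model.next_token_logits(prompt_ids))
        scores[letter] = log_probs[token_id]
    return scores


def score_fast(model, prompt_ids):
    # A single forward pass; read all four letter log-probs from it.
    log_probs = log_softmax(model.next_token_logits(prompt_ids))
    return {letter: log_probs[t] for letter, t in CHOICE_TOKEN_IDS.items()}
```

In this toy setup both functions return identical scores for the same prompt; the only difference is the number of forward passes, which is where the speedup comes from.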

Also adds BPB evals for MBPP translated to Java, Rust, and C++.
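
For readers unfamiliar with the metric, here is a rough sketch of how a bits-per-byte score can be computed, assuming BPB means bits per byte here and that per-token natural-log probabilities of the reference text are available. The function name `bits_per_byte` is hypothetical and is not taken from the eval library.

```python
import math


def bits_per_byte(token_log_probs, text):
    """Convert summed token log-probs (in nats) to bits per UTF-8 byte."""
    num_bytes = len(text.encode("utf-8"))
    if num_bytes == 0:
        raise ValueError("text must contain at least one byte")
    total_nats = -sum(token_log_probs)
    # Dividing by ln(2) converts nats to bits.
    return total_nats / (math.log(2) * num_bytes)
```

Normalizing by bytes rather than tokens keeps the score comparable across tokenizers, which matters when the same task is rendered in several programming languages.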

@davidheineman davidheineman self-assigned this May 19, 2025
@davidheineman davidheineman requested a review from epwalsh May 19, 2025 17:27

@epwalsh epwalsh left a comment

Sweet!

@epwalsh epwalsh merged commit 776778e into main May 19, 2025
15 checks passed
@epwalsh epwalsh deleted the fast-mc branch May 19, 2025 17:45
epwalsh pushed a commit that referenced this pull request May 27, 2025
Incorporate this one-line PR:
allenai/OLMo-in-loop-evals#12

TL;DR: #281 made in-loop RC and
BPB slower; this fixes that bug. **The RC/BPB in-loop evals run with
`ai2-olmo-eval~=0.8.0` are correct, just slower.**
TianhuaTao pushed a commit that referenced this pull request May 28, 2025
TianhuaTao pushed a commit that referenced this pull request May 28, 2025